comments, we organize our responses as follows

Neural Information Processing Systems

We thank the reviewers for their valuable feedback, which will significantly improve our paper. This is indeed a limitation of Theorem 1. The CHIP data included in our simulation studies shows that MDI-oob works in this setting; we plan to add this plot to the supplementary material. Reviewers 2 and 3 ask for theoretical/empirical evidence that MDI-oob can "debias" MDI. Empirically, we compute MDI-oob for the first simulation.
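For context, MDI scores a feature by summing the impurity decreases of the splits made on that feature, weighted by the fraction of samples reaching each split node. A minimal sketch of that classical in-bag ingredient (the function names are ours, and this is plain MDI, not the MDI-oob variant discussed in the response):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_impurity_decrease(parent, left, right):
    """Impurity decrease of one split. Summing these terms (each also
    weighted by the fraction of training samples reaching the node)
    over every split made on a feature gives that feature's MDI score."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))
```

MDI-oob, as discussed above, recomputes such quantities using the out-of-bag samples rather than the in-bag ones, which is the source of the debiasing effect.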


We thank all the reviewers for their insightful comments, suggestions, and references

Neural Information Processing Systems

We thank all the reviewers for their insightful comments, suggestions, and references. Novelty of the tandem loss: it is not new; we were not aware of the prior work and thank Reviewer 2 for bringing it up. While most of the computed bounds are non-vacuous, they appear not to be very tight. The reviewers also asked for a discussion of potential ways to obtain tighter bound values, or whether there is a fundamental limitation; we provide some discussion in Sections 3.2 and 4.4.




Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting

Lopes, Miles E., Wu, Suofei, Lee, Thomas C. M.

arXiv.org Machine Learning

When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression --- which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.
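The core idea can be sketched in a few lines (the function names and details below are ours, not the paper's implementation): hold the per-tree test-set predictions fixed and bootstrap over the trees themselves; the spread of the resampled ensembles' errors estimates how much an ensemble of this size still fluctuates around its infinite-ensemble limit.

```python
import random
import statistics

def ensemble_mse(tree_preds, y):
    """MSE of the ensemble obtained by averaging per-tree predictions."""
    n = len(y)
    avg = [sum(p[i] for p in tree_preds) / len(tree_preds) for i in range(n)]
    return sum((a - t) ** 2 for a, t in zip(avg, y)) / n

def bootstrap_convergence(tree_preds, y, n_boot=200, seed=0):
    """Resample the trees with replacement to mimic regenerating an
    ensemble of the same size; the standard deviation of the resulting
    MSEs estimates the algorithmic (ensemble-size) fluctuation."""
    rng = random.Random(seed)
    t = len(tree_preds)
    mses = [
        ensemble_mse([tree_preds[rng.randrange(t)] for _ in range(t)], y)
        for _ in range(n_boot)
    ]
    return statistics.stdev(mses)
```

If the returned fluctuation is small relative to the ensemble's MSE, growing the ensemble further is unlikely to change its test error much; this is the kind of stopping guarantee the practitioner is after.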


Cost-complexity pruning of random forests

Ravi, Kiran Bangalore, Serra, Jean

arXiv.org Machine Learning

Random forests perform bootstrap aggregation by sampling the training set with replacement. This enables the evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error, first of the individual decision trees and then of the random forest, by post-pruning. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without considerable loss in accuracy.
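The out-of-bag mechanism this abstract relies on is easy to illustrate (a toy sketch, not the authors' pruning code): a bootstrap sample of size n leaves roughly 1/e, about 36.8%, of the training points undrawn, and those held-out points can be used to evaluate, or here post-prune, that tree.

```python
import random

def bootstrap_oob_split(n, seed=0):
    """Draw a bootstrap sample of n training indices; the indices that
    were never drawn form the out-of-bag (OOB) set for this tree."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob
```

Aggregating each sample's prediction over only the trees for which it was out-of-bag yields the internal cross-validation estimate; the same held-out points can supply the validation signal for cost-complexity pruning of each tree.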